On optimization, parallelization and convergence of the Expectation-Maximization algorithm for finite mixtures of Bernoulli distributions
Author
Abstract
This paper reviews the Maximum Likelihood estimation problem and its solution via the Expectation-Maximization (EM) algorithm. Emphasis is placed on finite mixtures of multivariate Bernoulli distributions for modeling 0-1 data. General ideas about convergence and non-identifiability are presented. We discuss improvements to the algorithm and describe thoroughly what we believe are novel ideas in the treatment of the topic: 1) identification of unique data points and recycling of that information; 2) parallelization of the algorithm in a multi-threaded fashion; 3) cluster-assignment options. Experiments demonstrate that most of our approaches produce good results and encourage further research on the topic.
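To make the setting concrete, the following is a minimal sketch of EM for a K-component mixture of multivariate Bernoullis, including the first improvement the abstract mentions: collapsing duplicate 0-1 rows with `np.unique` so the E-step runs once per unique pattern, with the multiplicities reused as weights in the M-step. This is an illustrative implementation written for this summary, not the authors' code; the function name and defaults are assumptions.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=100, seed=0):
    """EM for a K-component mixture of multivariate Bernoullis.

    Illustrative sketch only. Duplicate rows of the binary matrix X are
    collapsed first, so each unique pattern is processed once per E-step
    and its count acts as a weight in the M-step.
    """
    rng = np.random.default_rng(seed)
    U, counts = np.unique(X, axis=0, return_counts=True)  # unique rows + multiplicities
    n_unique, D = U.shape
    w = counts / counts.sum()                     # weight of each unique pattern
    pi = np.full(K, 1.0 / K)                      # mixing proportions
    theta = rng.uniform(0.25, 0.75, size=(K, D))  # Bernoulli success probabilities
    for _ in range(n_iter):
        # E-step: log pi_k + log p(u | theta_k) for each unique pattern u
        log_post = U @ np.log(theta).T + (1 - U) @ np.log(1 - theta).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stabilization
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)                # responsibilities
        # M-step: each unique pattern contributes proportionally to its count
        Nk = (w[:, None] * r).sum(axis=0)
        pi = Nk / Nk.sum()
        theta = (w[:, None] * r).T @ U / Nk[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)           # avoid log(0)
    return pi, theta
```

On binary data with many repeated rows (common in 0-1 datasets), the E-step cost drops from O(nKD) to O(uKD), where u is the number of unique rows; this is the "recycling" idea in a nutshell.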
Similar resources
Global Convergence of Model Reference Adaptive Search for Gaussian Mixtures
While the Expectation-Maximization (EM) algorithm is a popular and convenient tool for mixture analysis, it only produces solutions that are locally optimal, and thus may not achieve the globally optimal solution. This paper introduces a new algorithm, based on the global optimization algorithm Model Reference Adaptive Search (MRAS), designed to produce globally optimal solutions in the estimat...
Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions
The class of finite mixtures of multivariate Bernoulli distributions is known to be nonidentifiable; that is, different values of the mixture parameters can correspond to exactly the same probability distribution. In principle, this would mean that sample estimates using this model would give rise to different interpretations. We give empirical support to the fact that estimation of this class ...
Bayesian Mixtures of Bernoulli Distributions
Mixtures of Bernoulli distributions [6] are frequently used for modeling binary random vectors. They differ from (restricted) Boltzmann Machines in that they model the marginal distribution over the binary data space X not as a product of (conditional) Bernoulli distributions, but as a weighted sum of Bernoulli distributions. Despite the non-identifiability of th...
Multivariate Structural Bernoulli Mixtures for Recognition of Handwritten Numerals
As shown recently, the structural optimization of probabilistic neural networks can be included into EM algorithm by introducing a special type of multivariate Bernoulli mixtures. However, the underlying loglikelihood criterion is known to be multimodal in case of mixtures and therefore the EM iteration process may be starting-point dependent. In the present paper we discuss the possibility of ...
Mixture Modeling of DNA Copy Number Amplification Patterns in Cancer
DNA copy number amplifications are hallmarks of many cancers. In this work we analyzed data of genome-wide DNA copy number amplifications collected from more than 4500 neoplasm cases. Based on the 0-1 representation of the data, we trained finite mixtures of multivariate Bernoulli distributions using the EM algorithm to describe the inherent structure in the data. The resulting component distri...